A Performability Model for Applications using Checkpointing

Author

  • John P. Dougherty
Abstract

An analytical model is used to investigate the effects of checkpointing on the performance and availability of sequential and parallel applications. Known as Steady-State Performability (SSP), this model provides a probabilistic method for quantifying delivered performance in the presence of failure and recovery. Input parameters describe both the distributed application and the processing environment. Terms quantify computation effort, as well as overheads for communication, synchronization and fault tolerance. Unifying performance and availability makes it possible to justify fault tolerance overheads. The model emphasizes simplicity over detail in the hope of guiding the application developer through design and into implementation.

Key Terms: reliability and performance modeling, parallel and distributed systems.

1.0 INTRODUCTION

Typically, performance and fault tolerance are placed into separate categories. For example, topics such as parallel processing, RISC, and compiler optimization fall into the performance category; triple-modular redundancy, checkpointing and forward recovery fall into the fault tolerance category. In many cases, these two aspects of computer science are considered orthogonal [13]. Performability extends performance to include fault tolerance issues by identifying the characteristics of computer applications that affect delivered performance when failures are taken into account.

In this manuscript, the distributed applications studied operate on an input set of values with a given size. The size of the input set may not be evident at the outset, but it is considered fixed once execution begins. Since the input size and the processing environment are fixed for a given execution, the expected computation time is also considered fixed. This computation time is split into checkpoint intervals of equal length. At the end of each interval a checkpoint is performed.
A rollback and recovery model [15] is assumed for recovery from process failure within an application.
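
The execution scheme described above (fixed computation time, equal-length checkpoint intervals, rollback to the last checkpoint on failure) can be illustrated with a small Monte Carlo sketch. This is not the SSP model itself, whose terms are not given in this excerpt. The function names, the parameters `checkpoint_cost`, `recovery_cost` and `failure_rate`, the exponential failure distribution, and the assumption that no failure occurs during recovery are all illustrative assumptions rather than details taken from the paper.

```python
import random


def simulate_run(total_work, num_intervals, checkpoint_cost, recovery_cost,
                 failure_rate, rng):
    """Return the wall-clock time needed to finish `total_work`.

    Assumptions (hypothetical, not from the paper): failures arrive with
    exponentially distributed gaps at `failure_rate`, a failure can strike
    during computation or during the checkpoint itself, and no failure
    occurs while recovering.
    """
    interval = total_work / num_intervals      # equal-length checkpoint intervals
    segment = interval + checkpoint_cost       # work plus the checkpoint that ends it
    elapsed = 0.0
    completed = 0
    while completed < num_intervals:
        # The exponential distribution is memoryless, so a fresh draw at the
        # start of every attempt is valid. A rate of 0 means no failures.
        if failure_rate > 0:
            time_to_failure = rng.expovariate(failure_rate)
        else:
            time_to_failure = float("inf")
        if time_to_failure >= segment:
            # Interval and checkpoint both finished: progress is now saved.
            elapsed += segment
            completed += 1
        else:
            # Failure: the partial interval is lost, recovery is paid, and
            # execution rolls back to the most recent checkpoint.
            elapsed += time_to_failure + recovery_cost
    return elapsed


def mean_completion_time(trials, seed=0, **params):
    """Average `simulate_run` over `trials` independent runs."""
    rng = random.Random(seed)
    return sum(simulate_run(rng=rng, **params) for _ in range(trials)) / trials
```

Calling `mean_completion_time` with different values of `num_intervals` gives a rough feel for the trade-off the abstract refers to: more checkpoints add overhead but reduce the work lost per failure.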

Related Resources

A Multiprocessor System with Non-Preemptive Earliest-Deadline-First Scheduling Policy: A Performability Study

This paper introduces an analytical method for approximating the performability of a firm real-time system modeled by a multi-server queue. The service discipline in the queue is earliest-deadline-first (EDF), which is an optimal scheduling algorithm. Real-time jobs with exponentially distributed relative deadlines arrive according to a Poisson process. All jobs have deadlines until the end of s...

An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment

Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...

Defining the Checkpoint Interval for Uncoordinated Checkpointing Protocols

Parallel applications running on large computers suffer from the absence of a reliable environment. Fault tolerance proposals, in general, rely on rollback-recovery strategies supported by checkpoint and/or message logging. There are well-defined models that address the optimum checkpoint interval for coordinated checkpointing. Nevertheless, there is a lack of models concerning uncoordinated ch...

Multiprocessor Performability Analysis

Conclusions: Performability models of multiprocessor systems and their evaluation are presented. Two cases in which hierarchical modeling is applied are examined. 1. Models are developed to analyze the behavior of processor arrays of various sizes in the presence of permanent, transient, intermittent, and near-coincident faults. Models can be generated for typical reconfiguration schemes that co...

Performability Evaluation Low-Powered Sensor Node by Stochastic Model Checking

In Wireless Sensor Network (WSN) applications, there may be many low-powered sensor nodes that communicate with each other wirelessly. Because the power supply is limited, the satisfiability of performability properties, especially those that include energy constraints, must be confirmed in the design phase. This helps to avoid implementing impracticable designs. A typical performability p...

Publication date: 1996